The R console looks like this:
Make sure that you set up a folder for this class.
You can knit the file. The first time you do this, make sure you have the knitr package installed. You have the option to knit into HTML (.html), PDF (.pdf), or Word (.docx). In general, in this course we will be knitting into .html.
To make something “code-looking”, surround it with the grave accent (`, also called a backtick), found in the upper left of most keyboards.
To create a header, place a hash tag (#) followed by a space at the start of the line. For example, # Header 1 creates a level 1 header and ## Header Level 2 creates a level 2 header.
To make text italic, put single asterisks around it, *like this*. To make text bold, put two asterisks around it, **like this**.
To make a list, just start creating your list using a - or * for each bullet, like this:
- list item 1
- list item 2
It is important that there is a blank line before the first bullet.
Add a link with the following code (include https:// so the link is not treated as a file path relative to your document):
[Link text that will display](https://www.google.com)
It will display like this:
Add an image with the following code (replace the path with your own image file or URL):
![Alt text](path/to/image.png)
It will display like this:
Alt text
Most Markdown syntax is covered in the RStudio RMarkdown Cheatsheet, Section 3.
Create an R chunk:
2+2
## [1] 4
OR
x<-4
echo=T or echo=F – determines whether or not to echo the source code in the output file. This can be useful if you are creating a document for someone who doesn’t need or want to see your code, just the output. In general, in this course I would like your code to be echoed in assignments. The default is echo=T.
results='markup' or results='hide' – determines whether or not the results will be displayed. This can be useful if you want to show code, but don’t care what the output is. The default is results='markup'.
eval=T or eval=F – determines whether or not to evaluate the code. This can be useful if you have a whole chunk of code you don’t want run, but you also don’t want to delete it. The default is eval=T.
There are many more options, including fig.width, fig.height, and cache. Most options are listed in the RStudio RMarkdown Cheatsheet, Section 5.
You can set options individually on each chunk and/or set global options with knitr::opts_chunk$set(your options here) in the first code chunk.
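A minimal sketch of what the body of that first (setup) chunk might contain. The option values here are illustrative choices, not course requirements; echo = TRUE matches the expectation above for assignments.

```r
# Set global chunk options once; every later chunk inherits them
# unless it overrides an option in its own chunk header.
knitr::opts_chunk$set(
  echo = TRUE,     # show the source code
  eval = TRUE,     # run the code
  fig.width = 6,   # figure width in inches (illustrative value)
  fig.height = 4   # figure height in inches (illustrative value)
)
```

A setup chunk is often given the header option include=FALSE, which still runs the code but hides both the code and its output in the knitted file.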
Rather than using a code chunk (which appears as its own block, separate from the text), you also have the option to use inline code. You can place the following within any sentence or paragraph:
`r codehere`
For example,
This is the number `r x`.
becomes… This is the number 4.
Packages can contain many things, including data sets and functions.
You can install packages using the packages tab or you can use the code install.packages('packageyouwant') in the console.
In each new R session where you want to use the package you will have to load it by typing library('packageyouwant') in the console (or in the RMarkdown document - more later).
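To get help with a package (or a function in a package) you can type ?packagename (or ?functionname) into the console. help(package = 'packagename') opens an index of everything the package documents.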
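If you want a document to install a package only when it is missing and then load it, one common pattern is sketched below. load_or_install() is a hypothetical helper written for this example; it is not a function from base R or any package.

```r
# Hypothetical helper: install a package only if it is missing,
# then attach it with library().
load_or_install <- function(pkg) {
  # requireNamespace() returns FALSE (instead of an error) when pkg is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() take the name stored in a variable
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}
```

For example, load_or_install('openintro') would install openintro the first time and simply load it afterwards. Installing once from the console, as described above, works just as well.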
Assigning Variables:
x <- 2+2
y <- 6
Calculations:
x/2
## [1] 2
x*2
## [1] 8
x+y
## [1] 10
Vectors:
#c() function: concatenate
vector1 <- c(1,2,9,15,1000)
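Arithmetic on a vector is applied to every element at once, so there is no need to loop over the elements. A short sketch with the same vector1:

```r
vector1 <- c(1, 2, 9, 15, 1000)
doubled <- vector1 * 2  # each element times 2: 2 4 18 30 2000
length(vector1)         # number of elements: 5
```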
Referencing Elements of a Vector:
vector1[1]
## [1] 1
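R counts positions starting at 1. The square brackets also accept ranges, several positions, negative positions (to drop elements), and logical conditions:

```r
vector1 <- c(1, 2, 9, 15, 1000)
vector1[2:3]           # elements 2 through 3: 2 9
vector1[c(1, 5)]       # first and fifth elements: 1 1000
vector1[-1]            # everything except the first element: 2 9 15 1000
vector1[vector1 > 10]  # only elements greater than 10: 15 1000
```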
Functions:
mean(vector1)
## [1] 205.4
If you are ever unsure about a function, you can type ?functionname into the console. In this case, ?mean.
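Other summary functions work the same way as mean(). Comparing the median (9) with the mean (205.4) shows how much the single value 1000 pulls the mean upward.

```r
vector1 <- c(1, 2, 9, 15, 1000)
sum(vector1)     # 1027
median(vector1)  # 9
min(vector1)     # 1
max(vector1)     # 1000
```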
For now, we will mostly be working with .csv and .xls files. Later in the course, we may discuss other types of files.
From a file on your computer:
From a package:
library("openintro")
## Please visit openintro.org for free statistics materials
##
## Attaching package: 'openintro'
## The following objects are masked from 'package:datasets':
##
## cars, trees
cars
## type price mpgCity driveTrain passengers weight
## 1 small 15.9 25 front 5 2705
## 2 midsize 33.9 18 front 5 3560
## 3 midsize 37.7 19 front 6 3405
## 4 midsize 30.0 22 rear 4 3640
## 5 midsize 15.7 22 front 6 2880
## 6 large 20.8 19 front 6 3470
## 7 large 23.7 16 rear 6 4105
## 8 midsize 26.3 19 front 5 3495
## 9 large 34.7 16 front 6 3620
## 10 midsize 40.1 16 front 5 3935
## 11 midsize 15.9 21 front 6 3195
## 12 large 18.8 17 rear 6 3910
## 13 large 18.4 20 front 6 3515
## 14 large 29.5 20 front 6 3570
## 15 small 9.2 29 front 5 2270
## 16 small 11.3 23 front 5 2670
## 17 midsize 15.6 21 front 6 3080
## 18 small 12.2 29 front 5 2295
## 19 large 19.3 20 front 6 3490
## 20 small 7.4 31 front 4 1845
## 21 small 10.1 23 front 5 2530
## 22 midsize 20.2 21 front 5 3325
## 23 large 20.9 18 rear 6 3950
## 24 small 8.4 46 front 4 1695
## 25 small 12.1 42 front 4 2350
## 26 small 8.0 29 front 5 2345
## 27 small 10.0 22 front 5 2620
## 28 midsize 13.9 20 front 5 2885
## 29 midsize 47.9 17 rear 5 4000
## 30 midsize 28.0 18 front 5 3510
## 31 midsize 35.2 18 rear 4 3515
## 32 midsize 34.3 17 front 6 3695
## 33 large 36.1 18 rear 6 4055
## 34 small 8.3 29 front 4 2325
## 35 small 11.6 28 front 5 2440
## 36 midsize 61.9 19 rear 5 3525
## 37 midsize 14.9 19 rear 5 3610
## 38 small 10.3 29 front 5 2295
## 39 midsize 26.1 18 front 5 3730
## 40 small 11.8 29 front 5 2545
## 41 midsize 21.5 21 front 5 3200
## 42 midsize 16.3 23 front 5 2890
## 43 large 20.7 19 front 6 3470
## 44 small 9.0 31 front 4 2350
## 45 midsize 18.5 19 front 5 3450
## 46 large 24.4 19 front 6 3495
## 47 small 11.1 28 front 5 2495
## 48 small 8.4 33 4WD 4 2045
## 49 small 10.9 25 4WD 5 2490
## 50 small 8.6 39 front 4 1965
## 51 small 9.8 32 front 5 2055
## 52 midsize 18.2 22 front 5 3030
## 53 small 9.1 25 front 4 2240
## 54 midsize 26.7 20 front 5 3245
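Printing a whole data set gets long quickly. The sketch below uses R's built-in mtcars data as a stand-in (it ships with base R, so it runs without openintro). The last line relates to the message above: openintro's cars masks base R's cars, and the package prefix datasets:: still reaches the base version.

```r
# head() returns only the first rows (6 by default; n changes that)
head(mtcars, n = 3)
# str() lists each column with its type and first few values
str(mtcars)
# The base R 'cars' data set, reached explicitly even when it is masked
nrow(datasets::cars)  # 50
```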
Make sure the file is saved in the same folder as your .Rmd file.
NYCairbnb <- read.csv("NYCairbnb2019.csv")
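If read.csv() reports that it cannot open the file, getwd() and file.exists() help check where R is looking. The second half writes a tiny made-up data frame to a temporary .csv and reads it back, so it runs anywhere; its column names and values are illustrative only.

```r
# Folder R currently reads files from (when knitting, this is by default
# the folder that contains the .Rmd file)
getwd()
# TRUE if the file is visible from that folder
file.exists("NYCairbnb2019.csv")

# Self-contained round trip with made-up data in a temporary file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(room_type = c("Private room", "Entire home/apt"),
                     price = c(100, 250)),
          tmp, row.names = FALSE)
small <- read.csv(tmp)
dim(small)  # 2 rows, 2 columns
```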
Assessing Size:
dim(NYCairbnb)
## [1] 48895 16
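dim() gives rows and columns together; nrow() and ncol() return them one at a time (for NYCairbnb, nrow() would give 48895). Shown here on the built-in mtcars data as a stand-in:

```r
dim(mtcars)   # rows, columns: 32 11
nrow(mtcars)  # 32
ncol(mtcars)  # 11
```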
Names:
names(NYCairbnb)
## [1] "id" "name"
## [3] "host_id" "host_name"
## [5] "neighbourhood_group" "neighbourhood"
## [7] "latitude" "longitude"
## [9] "room_type" "price"
## [11] "minimum_nights" "number_of_reviews"
## [13] "last_review" "reviews_per_month"
## [15] "calculated_host_listings_count" "availability_365"
Referencing values:
You can reference a particular row and/or column of a dataset by using dataset[row,column]. For example, if I wanted to know the value in the 1st row, 3rd column in the NYCairbnb dataset, I would use the command
NYCairbnb[1,3]
## [1] 2787
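The row and column positions in dataset[row,column] can also be column names, ranges, or left blank to mean "all". A sketch on the built-in mtcars data (a stand-in for NYCairbnb):

```r
mtcars[1, 3]                   # row 1, column 3 (disp): 160
mtcars[1, "mpg"]               # row 1, column named mpg: 21
mtcars[1:3, c("mpg", "cyl")]   # rows 1 to 3, two named columns
mtcars[2, ]                    # blank column position: all of row 2
```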
Referencing Columns:
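# Three equivalent ways to pull out the price column:
NYCairbnb$price
NYCairbnb[,"price"]
NYCairbnb[,10]
# attach() makes the columns available by name alone (use with care; see ?attach)
attach(NYCairbnb)
price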
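attach() places a copy of the data frame's columns on R's search path, so later changes to the data frame are not reflected, and column names can clash with other objects. A sketch of two common safeguards, using the built-in mtcars data as a stand-in: with() to use column names for a single expression, and detach() to undo attach().

```r
# with() evaluates one expression using the columns by name, attaching nothing
with(mtcars, mean(mpg))  # about 20.09

# If you do attach(), detach() removes the data frame from the search path
attach(mtcars)
mean(mpg)
detach(mtcars)
```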
Calculations:
mean(NYCairbnb$price)
## [1] 152.7207
sd(NYCairbnb$price)
## [1] 240.1542
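Some columns, such as reviews_per_month, contain NA (missing) values. mean() and sd() return NA if any value is missing unless you add na.rm = TRUE. A small made-up vector shows the difference:

```r
reviews <- c(0.03, NA, 0.14)  # made-up values for illustration
mean(reviews)                 # NA: one missing value makes the result missing
mean(reviews, na.rm = TRUE)   # 0.085, the mean of the non-missing values
```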
Conditional Subsetting:
#prints out all the rows where the price is at least $8000 per night
NYCairbnb[NYCairbnb$price >=8000,]
## id name host_id
## 4378 2953058 Film Location 1177497
## 6531 4737930 Spanish Harlem Apt 1235070
## 9152 7003697 Furnished room in Astoria apartment 20582832
## 12343 9528920 Quiet, Clean, Lit @ LES & Chinatown 3906464
## 17693 13894339 Luxury 1 bedroom apt. -stunning Manhattan views 5143901
## 29239 22436899 1-BR Lincoln Center 72390391
## 30269 23377410 Beautiful/Spacious 1 bed luxury flat-TriBeCa/Soho 18128455
## 40434 31340283 2br - The Heart of NYC: Manhattans Lower East Side 4382127
## host_name neighbourhood_group neighbourhood latitude longitude
## 4378 Jessica Brooklyn Clinton Hill 40.69137 -73.96723
## 6531 Olson Manhattan East Harlem 40.79264 -73.93898
## 9152 Kathrine Queens Astoria 40.76810 -73.91651
## 12343 Amy Manhattan Lower East Side 40.71355 -73.98507
## 17693 Erin Brooklyn Greenpoint 40.73260 -73.95739
## 29239 Jelena Manhattan Upper West Side 40.77213 -73.98665
## 30269 Rum Manhattan Tribeca 40.72197 -74.00633
## 40434 Matt Manhattan Lower East Side 40.71980 -73.98566
## room_type price minimum_nights number_of_reviews last_review
## 4378 Entire home/apt 8000 1 1 2016-09-15
## 6531 Entire home/apt 9999 5 1 2015-01-02
## 9152 Private room 10000 100 2 2016-02-13
## 12343 Private room 9999 99 6 2016-01-01
## 17693 Entire home/apt 10000 5 5 2017-07-27
## 29239 Entire home/apt 10000 30 0
## 30269 Entire home/apt 8500 30 2 2018-09-18
## 40434 Entire home/apt 9999 30 0
## reviews_per_month calculated_host_listings_count availability_365
## 4378 0.03 11 365
## 6531 0.02 1 0
## 9152 0.04 1 0
## 12343 0.14 1 83
## 17693 0.16 1 0
## 29239 NA 1 83
## 30269 0.18 1 251
## 40434 NA 1 365
NYCairbnb[NYCairbnb$price >=8000, c("host_name","neighbourhood", "room_type","price")]
## host_name neighbourhood room_type price
## 4378 Jessica Clinton Hill Entire home/apt 8000
## 6531 Olson East Harlem Entire home/apt 9999
## 9152 Kathrine Astoria Private room 10000
## 12343 Amy Lower East Side Private room 9999
## 17693 Erin Greenpoint Entire home/apt 10000
## 29239 Jelena Upper West Side Entire home/apt 10000
## 30269 Rum Tribeca Entire home/apt 8500
## 40434 Matt Lower East Side Entire home/apt 9999
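Two related tools, sketched on the built-in mtcars data plus a tiny made-up data frame: sum() on a condition counts matching rows (TRUE counts as 1), and subset() does the same row/column selection as [ , ]. One difference: when the condition is NA, [ , ] returns a row of NAs, while subset() drops that row.

```r
# Count rows meeting a condition
sum(mtcars$mpg > 30)  # 4

# Same selection as mtcars[mtcars$mpg > 30, c("mpg", "cyl")]
subset(mtcars, mpg > 30, select = c(mpg, cyl))

# Made-up data with a missing price
prices <- data.frame(price = c(50, NA, 9000))
prices[prices$price >= 8000, , drop = FALSE]  # 2 rows: an all-NA row and 9000
subset(prices, price >= 8000)                 # 1 row: 9000
```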
- … # before a line of comment)
When naming variables, observations, data frames, or files, make them:
Other naming considerations:
- … filter or mean)
- … surface_temp = surface temperature measurement on Mars in degrees Celsius)
Some suggestions for best practices:
- … purple vs. Purple vs. purple_)
- … NA, NaN, -9999, -); don’t leave cells blank
by @alisonhorst